7 research outputs found

    ILU Smoothers for AMG with Scaled Triangular Factors

    ILU smoothers are effective in the algebraic multigrid (AMG) V-cycle for reducing high-frequency components of the residual error. However, direct triangular solves are comparatively slow on GPUs. Previous work by Chow and Patel (2015) and Anzt et al. (2015) demonstrated the advantages of Jacobi relaxation as an alternative. Depending on the threshold and fill-level parameters chosen, however, the factors are highly non-normal and Jacobi is unlikely to converge in a low number of iterations. The Ruiz algorithm applies row or row/column scaling to U in order to reduce the departure from normality, and the inherently sequential solve is replaced with a Richardson iteration. There are several advantages beyond the lower compute time. Scaling is performed locally for a diagonal block of the global matrix because it is applied directly to the factor. An ILUT Schur complement smoother maintains a constant GMRES iteration count as the number of MPI ranks increases, and thus parallel strong scaling is improved. The new algorithms are included in hypre and achieve improved time to solution for several Exascale applications, including the Nalu-Wind and PeleLM pressure solvers. For large problem sizes, GMRES+AMG with iterative triangular solves executes at least five times faster than with direct solves on massively parallel GPUs. Comment: v2 updated citation information; v3 updated results; v4 abstract updated, new results added; v5 new experimental analysis and results added
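    The idea of replacing a sequential triangular solve with a fixed number of relaxation sweeps can be illustrated with a minimal sketch. This is not the hypre implementation: the function names `jacobi_upper_solve` and `back_substitution`, the dense list-of-lists storage, and the small test matrix are all assumptions made for illustration. Jacobi relaxation is used here, which is a Richardson iteration on the diagonally scaled system; the Ruiz scaling step described in the abstract is omitted.

```python
def jacobi_upper_solve(U, b, iterations):
    """Approximate x in U x = b for upper-triangular U using Jacobi sweeps.

    Each sweep computes x <- x + D^{-1} (b - U x), where D is the diagonal
    of U. All rows are updated independently, which is what makes the
    sweep attractive on parallel hardware. (Hypothetical helper.)
    """
    n = len(b)
    x = [b[i] / U[i][i] for i in range(n)]  # initial guess x0 = D^{-1} b
    for _ in range(iterations):
        # Residual r = b - U x, touching only the upper triangle.
        r = [b[i] - sum(U[i][j] * x[j] for j in range(i, n)) for i in range(n)]
        x = [x[i] + r[i] / U[i][i] for i in range(n)]
    return x


def back_substitution(U, b):
    """Reference direct solve: inherently sequential, last row first."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x


# Illustrative 3x3 upper-triangular system (made-up values).
U = [[4.0, 1.0, 0.5],
     [0.0, 3.0, 1.0],
     [0.0, 0.0, 2.0]]
b = [1.0, 2.0, 3.0]

x_iter = jacobi_upper_solve(U, b, iterations=2)
x_direct = back_substitution(U, b)
```

    Because the Jacobi iteration matrix of a triangular matrix is strictly triangular and therefore nilpotent, n - 1 sweeps from this initial guess reproduce the direct solve up to rounding. The practical question raised in the abstract is whether far fewer sweeps give a good enough approximation, and that depends on how non-normal the factor is.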

    GPU-resident sparse direct linear solvers for alternating current optimal power flow analysis

    Integrating renewable resources within the transmission grid at a wide scale poses significant challenges for economic dispatch as it requires analysis with more optimization parameters, constraints, and sources of uncertainty. This motivates the investigation of more efficient computational methods, especially those for solving the underlying linear systems, which typically take more than half of the overall computation time. In this paper, we present our work on sparse linear solvers that take advantage of hardware accelerators, such as graphics processing units (GPUs), and improve the overall performance when used within economic dispatch computations. We treat the problems as sparse, which allows for faster execution but also makes the implementation of numerical methods more challenging. We present the first GPU-native sparse direct solver that can execute on both AMD and NVIDIA GPUs. We demonstrate significant performance improvements when using high-performance linear solvers within alternating current optimal power flow (ACOPF) analysis. Furthermore, we demonstrate the feasibility of getting significant performance improvements by executing the entire computation on GPU-based hardware. Finally, we identify outstanding research issues and opportunities for even better utilization of heterogeneous systems, including those equipped with GPUs.


    Scalability of high-performance PDE solvers

    Performance tests and analyses are critical to effective HPC software development and are central components in the design and implementation of computational algorithms for achieving faster simulations on existing and future computing architectures for large-scale application problems. In this paper, we explore performance and space-time trade-offs for important compute-intensive kernels of large-scale numerical solvers for PDEs that govern a wide range of physical applications. We consider a sequence of PDE-motivated bake-off problems designed to establish best practices for efficient high-order simulations across a variety of codes and platforms. We measure peak performance (degrees of freedom per second) on a fixed number of nodes and identify effective code optimization strategies for each architecture. In addition to peak performance, we identify the minimum time to solution at 80% parallel efficiency. The performance analysis is based on spectral and p-type finite elements but is equally applicable to a broad spectrum of numerical PDE discretizations, including finite difference, finite volume, and h-type finite elements. Comment: 25 pages, 54 figures
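    The "minimum time to solution at 80% parallel efficiency" metric can be sketched as follows. This assumes the common strong-scaling definition of efficiency relative to the smallest node count, which may differ from the definition used in the paper. The function names and the timing values are hypothetical and are not results from the paper.

```python
def strong_scaling_efficiency(timings):
    """Return (nodes, seconds, efficiency) for each (nodes, seconds) pair.

    Efficiency is measured relative to the run with the fewest nodes:
    E(P) = (P0 * T(P0)) / (P * T(P)). (Assumed definition.)
    """
    base_nodes, base_time = min(timings)
    return [(p, t, (base_nodes * base_time) / (p * t)) for p, t in timings]


def min_time_at_efficiency(timings, threshold=0.8):
    """Smallest time among runs whose efficiency meets the threshold.

    Returns a (seconds, nodes) pair, or None if no run qualifies.
    """
    eligible = [(t, p) for p, t, e in strong_scaling_efficiency(timings)
                if e >= threshold]
    return min(eligible) if eligible else None


# Made-up timings for a fixed problem size: (nodes, seconds).
timings = [(1, 100.0), (2, 52.0), (4, 28.0), (8, 17.0), (16, 12.0)]
best = min_time_at_efficiency(timings, threshold=0.8)
```

    With these illustrative numbers, the 8- and 16-node runs are faster in absolute terms but fall below the 80% threshold, so the 4-node run is the one reported.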